Information Approach to Co-occurrence of Words in Written Language

نویسنده

  • Damián G. Hernández
چکیده

In this paper we study the distribution of words across the different parts of a book using tools from information theory. In particular, the mutual information between words in the text and parts of the text is compared with the mutual information of a shuffled version of the book. This analysis allows us to extract not only relevant words of the text but also relationships between the different words, such as cooccurrence and repulsion between them. With the connections due to co-occurrence of words, we show how to construct a network that reflects the semantic organization of the book. This method can be applied to other types of sequences, measuring the relations between the different symbols that compose such sequences.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Iranian EFL Learners’ Reactions to Different Feedbacks in Writing Classrooms: Teacher Written Comments (TWC) vs. Peer Written Comments (PWC)

The teaching of writing has recently begun to move away from a concentration on the written product to an emphasis on the process of writing. Feedback is a fundamental element of the process approach to writing. It can be defined as input from a reader to a writer with the effect of providing information to the writer for a revision. This study reports on the effectiveness of two types of feedb...

متن کامل

Drawing Word co-occurrence map of Spinal Muscular Atrophy disease

Introduction:  The purpose of this article is to evaluate the status of articles in the field of Spinal Muscular Atrophy According to the Scientometrics indices Word co-occurrence map of this field . Methods: The present study is an applied one with a quantitative approach and a descriptive approach. It has been done using scientometrics and the co-occurrence words analysis technique. Document...

متن کامل

Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities

This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...

متن کامل

Probabilistic Linkage of Persian Record with Missing Data

Extended Abstract. When the comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets would lead to information loss available in other data sets. Hence, it is necessary to integrate scattered information to a comprehensive unique data set. On the other hand, sometimes we are interested in recognition of duplications in a data set. The i...

متن کامل

روش جدید متن‌کاوی برای استخراج اطلاعات زمینه کاربر به‌منظور بهبود رتبه‌بندی نتایج موتور جستجو

Today, the importance of text processing and its usages is well known among researchers and students. The amount of textual, documental materials increase day by day. So we need useful ways to save them and retrieve information from these materials. For example, search engines such as Google, Yahoo, Bing and etc. need to read so many web documents and retrieve the most similar ones to the user ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Complex Systems

دوره 24  شماره 

صفحات  -

تاریخ انتشار 2015